
Conversation

@jingyu-ml
Contributor

@jingyu-ml jingyu-ml commented Jan 23, 2026

What does this PR do?

Type of change: New feature

Overview:

This PR adds HuggingFace checkpoint export support for LTX-2 by treating TI2VidTwoStagesPipeline as a diffusion-like pipeline, exporting only the stage-1 transformer (with QKV-fusion-enabled dummy inputs), and falling back to writing model.safetensors when save_pretrained isn't available. It also preserves the original forward in DynamicModule patching (_forward_pre_dm) so downstream callers can still invoke the pre-patched forward implementation.

Changes

  1. Added calibration and quantization support for LTX-2, including FP8 precision.
  2. Preserve the original forward before DynamicModule patching: when patching forward, we now stash the pre-patched implementation in self._forward_pre_dm (once) so downstream code can still call the original forward, then re-bind forward to the class implementation. This is needed for the LTX-2 FP8 calibration (see the first sketch after this list).
  3. Added LTX-2 HF export path: export_hf_checkpoint() now also treats ltx_pipelines.ti2vid_two_stages.TI2VidTwoStagesPipeline as a "diffusion-like" object and routes it through _export_diffusers_checkpoint() (import guarded; no hard dependency).
  4. Generalized component discovery: introduced get_diffusion_components() (aliasing the old get_diffusers_components) to support non-diffusers pipelines; for LTX-2 it returns only stage_1_transformer (see the detection sketch below).
  5. Enabled QKV fusion for the LTX-2 backbone: added a model-aware dummy forward generator (generate_diffusion_dummy_forward_fn) that builds minimal LTX Modality inputs (including correct timesteps broadcasting) so shared-input hooks can run and fuse QKV when applicable (see the dummy-forward sketch below).
  6. Export fallback for modules without save_pretrained: when a component lacks save_pretrained (e.g., the LTX-2 transformer), export now writes model.safetensors plus a minimal config.json instead of pytorch_model.bin.
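
A minimal sketch of the forward-preservation pattern from item 2, for reviewers skimming the diff. Only the _forward_pre_dm name and the re-binding behavior come from this PR; the class and method names below are illustrative stand-ins, not the actual DynamicModule code:

import types

class PatchedModule:  # illustrative stand-in for a DynamicModule subclass
    def forward(self, x):
        return x  # class-level (patched) implementation

    def _patch_forward(self):
        # Stash the pre-patched bound forward exactly once so downstream
        # code (e.g. the LTX-2 FP8 calibration) can still call the original.
        if not hasattr(self, "_forward_pre_dm"):
            self._forward_pre_dm = self.forward
        # Re-bind forward to the class implementation.
        self.forward = types.MethodType(type(self).forward, self)

Downstream code can then call module._forward_pre_dm(...) to reach the pre-patched behavior.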
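
A sketch of the import-guarded detection and component discovery from items 3 and 4. The module path and class name come from the PR; the stage_1_transformer attribute access and the diffusers components fallback are assumptions about the pipelines' layout:

def get_diffusion_components(pipe) -> dict:
    """Return the exportable components of a diffusion-like pipeline."""
    try:
        from ltx_pipelines.ti2vid_two_stages import TI2VidTwoStagesPipeline
    except ImportError:  # import guarded: no hard dependency on ltx_pipelines
        TI2VidTwoStagesPipeline = None

    if TI2VidTwoStagesPipeline is not None and isinstance(pipe, TI2VidTwoStagesPipeline):
        # For LTX-2, only the stage-1 transformer is exported.
        return {"stage_1_transformer": pipe.stage_1_transformer}

    # diffusers pipelines expose their parts through the components dict.
    return dict(getattr(pipe, "components", {}))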
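
Finally, the rough shape of the dummy-forward generator from item 5. The input construction and call signature below are purely illustrative; the real generate_diffusion_dummy_forward_fn builds LTX Modality objects with the correct shapes:

import torch

def generate_diffusion_dummy_forward_fn(model):
    """Return a zero-argument callable that runs one dummy forward pass."""

    def dummy_forward():
        batch = 1
        # Hypothetical latent input: a short token sequence.
        hidden_states = torch.randn(batch, 16, 64)
        # Broadcast a scalar timestep across the batch so the conditioning
        # path sees valid values during the tracing pass.
        timesteps = torch.zeros((), dtype=torch.float32).expand(batch)
        with torch.no_grad():
            model(hidden_states, timesteps)

    return dummy_forward

Running the returned callable once lets shared-input hooks observe that the Q/K/V projections consume the same activation, which is what enables QKV fusion.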

Plans

  • [1/4] Add the basic functionalities to support limited image models with NVFP4 + FP8, with some refactoring on the previous LLM code and the diffusers example. PIC: @jingyu-ml
  • [2/4] Add support to more video gen models. PIC: @jingyu-ml
  • [3/4] Add test cases, refactor on the doc, and all related README. PIC: @jingyu-ml
  • [4/4] Add the final support to ComfyUI. PIC: @jingyu-ml

Usage

python quantize.py --model ltx-2 --format fp4 --batch-size 64 --calib-size 1 --n-steps 40 \
  --extra-param checkpoint_path=/home/scratch.omniml_data_2/jingyux/models/LTX-2/ltx-2-19b-dev-fp8.safetensors \
  --extra-param distilled_lora_path=/home/scratch.omniml_data_2/jingyux/models/LTX-2/ltx-2-19b-distilled-lora-384.safetensors \
  --extra-param spatial_upsampler_path=/home/scratch.omniml_data_2/jingyux/models/LTX-2/ltx-2-spatial-upscaler-x2-1.0.safetensors \
  --extra-param gemma_root=/home/scratch.omniml_data_2/jingyux/models/LTX-2/gemma-3-12b-it-qat-q4_0-unquantized \
  --extra-param fp8transformer=true \
  --hf-ckpt-dir ./ltx2-nvfp4

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

Release Notes

  • New Features

    • Added LTX-2 video model support with complete quantization and export pipeline integration
    • Introduced --extra-param CLI option for flexible model configuration and parameter passing
    • Enhanced export capabilities with broader diffusion model compatibility
  • Chores

    • Changed default model data type from Half to BFloat16 for improved numerical stability


Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml requested review from a team as code owners January 23, 2026 07:43
@jingyu-ml jingyu-ml changed the title from "Jingyux/2 3 diffusion export" to "[2/4] Diffusion Quantized ckpt export" Jan 23, 2026
@jingyu-ml jingyu-ml marked this pull request as ready for review January 23, 2026 22:36
@jingyu-ml jingyu-ml requested a review from a team as a code owner January 23, 2026 22:36
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@codecov

codecov bot commented Jan 24, 2026

Codecov Report

❌ Patch coverage is 57.57576% with 70 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.40%. Comparing base (aafd388) to head (bc3e5bb).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
...odelopt/torch/quantization/qtensor/mxfp8_tensor.py 25.00% 57 Missing ⚠️
.../torch/quantization/nn/modules/tensor_quantizer.py 22.22% 7 Missing ⚠️
modelopt/onnx/utils.py 86.95% 3 Missing ⚠️
modelopt/onnx/autocast/convert.py 84.61% 2 Missing ⚠️
modelopt/onnx/quantization/quantize.py 95.65% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #810      +/-   ##
==========================================
+ Coverage   73.31%   73.40%   +0.08%     
==========================================
  Files         192      193       +1     
  Lines       19613    19911     +298     
==========================================
+ Hits        14380    14616     +236     
- Misses       5233     5295      +62     

Comment on lines +879 to +892
else:
    cpu_state_dict = {
        k: v.detach().contiguous().cpu() for k, v in component.state_dict().items()
    }
    save_file(cpu_state_dict, str(component_export_dir / "model.safetensors"))
with open(component_export_dir / "config.json", "w") as f:
    json.dump(
        {
            "_class_name": type(component).__name__,
            "_export_format": "safetensors_state_dict",
        },
        f,
        indent=4,
    )
Contributor

Can we combine these with L851 to L863? They look duplicated.

Why do we need to offload tensors to CPU before saving?

Contributor Author

If we always save with safetensors, keeping the .cpu() is the safe default choice; this is also how transformers/diffusers save_pretrained writes tensors to the safetensors file.

Contributor Author

@jingyu-ml jingyu-ml Jan 26, 2026

Could you clarify more?

Can we combine these with L851 to L863? They look duplicated.

Line 880 saves the state dict to a safetensors file, and line 884 saves the quant config to config.json. We use these two functions only when the model is not diffusers-based.

Contributor

    cpu_state_dict = {
        k: v.detach().contiguous().cpu() for k, v in component.state_dict().items()
    }
    save_file(cpu_state_dict, str(component_export_dir / "model.safetensors"))
with open(component_export_dir / "config.json", "w") as f:
    json.dump(
        {
            "_class_name": type(component).__name__,
            "_export_format": "safetensors_state_dict",
        },
        f,
        indent=4,
    )

I mean this code block appears twice in the same script.

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml requested a review from Edwardf0t1 January 26, 2026 23:00
@jingyu-ml jingyu-ml force-pushed the jingyux/2-3-diffusion-export branch from ef4f814 to 9f0e998 Compare January 27, 2026 09:28
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml requested a review from cjluo-nv January 27, 2026 09:53
Collaborator

@ChenhanYu ChenhanYu left a comment

Commented on the dynamic module part.

Contributor

@Edwardf0t1 Edwardf0t1 left a comment

LGTM, left a few more comments.

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>